Prepare and Optimize Data Sets for Data Mining Analysis

Author

Bhaskar N

Abstract

Preparing a data set for analysis is usually the most tedious task in a data mining project, requiring many complex SQL queries, table joins, and column aggregations. Existing SQL aggregations are of limited use for preparing data sets because they return one column per aggregated group. As a rule, significant manual effort is required to build data sets where a horizontal layout is needed. We propose simple yet effective methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers rather than one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout, which is the standard layout required by most data mining algorithms. We propose three basic strategies to evaluate horizontal aggregations: CASE, which exploits the programming CASE construct; SPJ, which is based on standard relational algebra operators (SPJ queries); and PIVOT, which uses the PIVOT operator offered by some DBMSs. Horizontal aggregations produce large volumes of data, so partitioning the resulting data sets into homogeneous clusters is an important part of the system. This partitioning can be performed with the K-means clustering algorithm.

Keywords: aggregation; data preparation; pivoting; clustering
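As a rough illustration of the CASE strategy named in the abstract, the following sketch generates one SUM(CASE ...) column per distinct value of a pivoting column. It is not the authors' code: the `sales` table, its columns (`store`, `quarter`, `amount`), the `horizontal_sum` helper, and the use of SQLite through Python's `sqlite3` module are all assumptions made for the example. The choice of `ELSE 0` rather than `ELSE NULL` is also an assumption.

```python
import sqlite3

# Hypothetical toy data: one row per sale, in a vertical (normalized) layout.
ROWS = [
    ("A", "Q1", 10.0),
    ("A", "Q2", 5.0),
    ("A", "Q1", 2.5),
    ("B", "Q2", 7.0),
]


def horizontal_sum(conn):
    """Return one row per store with one SUM column per quarter (CASE strategy)."""
    # The distinct values of the pivoting column become the output columns.
    quarters = [row[0] for row in conn.execute(
        "SELECT DISTINCT quarter FROM sales ORDER BY quarter")]
    # Generate one SUM(CASE ...) term per distinct quarter.
    # The quarter values are bound as parameters, not pasted into the SQL text.
    terms = ", ".join(
        f"SUM(CASE WHEN quarter = ? THEN amount ELSE 0 END) AS amt_{i}"
        for i in range(len(quarters)))
    sql = f"SELECT store, {terms} FROM sales GROUP BY store ORDER BY store"
    return quarters, conn.execute(sql, quarters).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
quarters, result = horizontal_sum(conn)
```

With this toy data, `result` holds one row per store and one column per quarter, so store A carries the totals 12.5 and 5.0 side by side instead of in two separate rows. A standard vertical `GROUP BY store, quarter` query would return those totals as separate rows.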

Similar resources

Exploration of Kahang porphyry copper deposit using advanced integration of geological, remote sensing, geochemical, and magnetics data

The purpose of mineral exploration is to find ore deposits. The main aim of this work is to use the fuzzy inference system to integrate the exploration layers including the geological, remote sensing, geochemical, and magnetic data. The studied area was the porphyry copper deposit of the Kahang area in the preliminary stage of exploration. Overlaying of rock units and tectonic layers were used ...

Applying a decision support system for accident analysis by using data mining approach: A case study on one of the Iranian manufactures

Uncertain and stochastic states have always been taken into consideration in risk management and accident analysis, as in other fields of industrial engineering, and have made it difficult for managers to select corrective actions and control measures. In this research, huge data sets of the accidents of a manufacturing and industrial unit have been st...

Comparing Medical Comorbidities Between Opioid and Cocaine Users: A Data Mining Approach

Background: Prescription drug monitoring programs (PDMPs) are instrumental in controlling opioid misuse, but opioid users have increasingly shifted to cocaine, creating a different set of medical problems. While opioid use results in multiple medical comorbidities, findings of the existing studies reported single comorbidities rather...

On Mining Fuzzy Classification Rules for Imbalanced Data

Fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly on the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...

Publication date: 2013
